Author: Charles Tapley Hoyt
Estimated Runtime:
Often in BEL documents, complexes are named by their members. Additional information might be avaliable from other sources, so these complexes should be resolved to their external names if possible.
In this investigation, the human 9-1-1 Complex is resolved while other complexes without names are left.
In [1]:
import itertools as itt
import math
import time
import os
import pybel
import networkx as nx
from pybel.constants import *
import matplotlib.pyplot as plt
import pybel_tools as pbt
from pybel_tools.visualization import to_jupyter
%matplotlib inline
In [2]:
pybel.__version__
Out[2]:
In [3]:
pbt.__version__
Out[3]:
In [4]:
time.asctime()
Out[4]:
In [5]:
manager = pybel.manager.CacheManager()
In [6]:
url = 'http://resources.openbel.org/belframework/20150611/resource/named-complexes.bel'
graph = pybel.from_url(url, manager=manager)
In [7]:
graph.number_of_nodes(), graph.number_of_edges()
Out[7]:
In [8]:
plt.title('Size Rank Distribution of Named Complexes')
plt.xlabel('Rank')
plt.ylabel('Size')
plt.plot(sorted([graph.degree(c) for c in graph], reverse=True))
plt.show()
During parsing and compiling, PyBEL transforms each complex into a canonical tuple that consists of the names of each of its members. The prior knowledge graph contains orthologous proteins for each complex, so the members are filtered to match human proteins with HGNC identifiers, rather than MGI or RDB. A mapping is built from the canonical tuple to its name.
In [9]:
# build mapping
t2n = {}
relabel_mapping = {}
for node in pbt.filters.get_nodes_by_function(graph, COMPLEX):
data = graph.node[node]
members = [
member
for member in graph.edge[node]
if graph.node[member][NAMESPACE] == 'HGNC'
]
t = (COMPLEX,) + tuple(sorted(members))
t2n[t] = data
relabel_mapping[t] = node
In [10]:
test_bel = """SET DOCUMENT Name = "PyBEL Test Document"
SET DOCUMENT Description = "Made for testing PyBEL parsing"
SET DOCUMENT Version = "1.6.0"
SET DOCUMENT Copyright = "Copyright (c) Charles Tapley Hoyt. All Rights Reserved."
SET DOCUMENT Authors = "Charles Tapley Hoyt"
SET DOCUMENT Licenses = "Other / Proprietary"
SET DOCUMENT ContactInfo = "charles.hoyt@scai.fraunhofer.de"
DEFINE NAMESPACE ChEBI AS URL "http://resources.openbel.org/belframework/20150611/namespace/chebi.belns"
DEFINE NAMESPACE HGNC AS URL "http://resources.openbel.org/belframework/20150611/namespace/hgnc-human-genes.belns"
SET Citation = {"Other","Test Title","123456"}
SET Evidence = "Evidence 1"
complex(p(HGNC:HUS1), p(HGNC:RAD1), p(HGNC:RAD9A)) increases a(ChEBI:"oxygen radical")
complex(p(HGNC:SFN), p(HGNC:YWHAB)) cnc a(ChEBI:"oxygen radical")"""
target_graph = pybel.from_lines(test_bel.split('\n'), manager=manager)
In [11]:
to_jupyter(target_graph, replace_cnames=True)
Out[11]:
Finally, the complexes are quickly matched with the mapping dictionary. The appropriate annotations are added, and the node is renamed. The network is shown again with the resolved nodes.
In [12]:
# fix node data
for c in pbt.filters.get_nodes_by_function(target_graph, COMPLEX):
if c in t2n:
print('matched')
target_graph.node[c].update(t2n[c])
In [13]:
to_jupyter(target_graph, replace_cnames=True)
Out[13]: